Smart Paper Notes

CVSS-BERT: Explainable Natural Language Processing to Determine the Severity of a Computer Security Vulnerability from its Description

Mustafizur Shahid , Hervé Debar

Categories: Natural Language Processing | Artificial Intelligence | Machine Learning

2021-11-16

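When a new computer security vulnerability is publicly disclosed, only a textual description of it is available. Cybersecurity experts later provide an analysis of the severity of the vulnerability using the Common Vulnerability Scoring System (CVSS). Specifically, the different characteristics of the vulnerability are summarized into a vector (consisting of a set of metrics), from which a severity score is computed. However, because of the high number of vulnerabilities disclosed every day, this process requires a lot of manpower, and several days may pass before a vulnerability is analyzed. We propose to leverage recent advances in the field of Natural Language Processing (NLP) to determine the CVSS vector and the associated severity score of a vulnerability from its textual description in an explainable manner. To this end, we trained multiple BERT classifiers, one for each metric composing the CVSS vector. Experimental results show that our trained classifiers are able to determine the value of the metrics of the CVSS vector with high accuracy. The severity score computed from the predicted CVSS vector is also very close to the real severity score attributed by a human expert. For explainability purposes, a gradient-based input saliency method was used to determine the most relevant input words for a given prediction made by our classifiers. Often, the top relevant words include terms in agreement with the rationales of a human cybersecurity expert, making the explanation comprehensible for end-users.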
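
The step from a CVSS vector to a severity score can be illustrated with a minimal sketch. The abstract does not say which CVSS version the authors used; the code below assumes the public CVSS v3.1 base-score equations and weights, and the names `base_score` and `roundup` are chosen here for illustration only.

```python
import math

# Assumption: CVSS v3.1 base-metric weights from the public specification.
WEIGHTS = {
    "AV": {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2},
    "AC": {"L": 0.77, "H": 0.44},
    "UI": {"N": 0.85, "R": 0.62},
    "C": {"H": 0.56, "L": 0.22, "N": 0.0},
    "I": {"H": 0.56, "L": 0.22, "N": 0.0},
    "A": {"H": 0.56, "L": 0.22, "N": 0.0},
}
# Privileges Required depends on whether Scope is changed.
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}


def roundup(value):
    """Round up to one decimal place, as defined in CVSS v3.1."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(vector):
    """Compute the base score for a vector such as 'AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H'."""
    parts = dict(item.split(":") for item in vector.split("/"))
    parts.pop("CVSS", None)  # drop an optional 'CVSS:3.1' prefix
    changed = parts["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[parts["PR"]]

    # Impact sub-score from confidentiality, integrity, availability.
    iss = 1 - (
        (1 - WEIGHTS["C"][parts["C"]])
        * (1 - WEIGHTS["I"][parts["I"]])
        * (1 - WEIGHTS["A"][parts["A"]])
    )
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss

    exploitability = (
        8.22 * WEIGHTS["AV"][parts["AV"]] * WEIGHTS["AC"][parts["AC"]]
        * pr * WEIGHTS["UI"][parts["UI"]]
    )

    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))
```

In a pipeline like the one described, each per-metric classifier would supply one letter of the vector, and a function of this kind would turn the predicted vector into a score.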


Generalizable Natural Language Processing Framework for Migraine Reporting from Social Media

Yuting Guo , Swati Rajwal , Sahithi Lakamana , Chia-Chun Chiang , Paul C. Menell , Adnan H. Shahid , Yi-Chieh Chen , Nikita Chhabra , Wan-Ju Chao , Chieh-Ju Chao

Categories: Natural Language Processing

2022-12-23

Migraine is a high-prevalence and disabling neurological disorder. However, information about migraine management in real-world settings could be limited to traditional health information sources. In this paper, we (i) verify that there is substantial migraine-related chatter available on social media (Twitter and Reddit), self-reported by migraine sufferers; (ii) develop a platform-independent text classification system for automatically detecting self-reported migraine-related posts; and (iii) conduct analyses of the self-reported posts to assess the utility of social media for studying this problem. We manually annotated 5750 Twitter posts and 302 Reddit posts. Our system achieved an F1 score of 0.90 on Twitter and 0.93 on Reddit. Analysis of information posted by our 'migraine cohort' revealed the presence of a plethora of relevant information about migraine therapies and patient sentiments associated with them. Our study forms the foundation for conducting an in-depth analysis of migraine-related information using social media data.


An ensemble neural network approach to forecast Dengue outbreak based on climatic condition

Madhurima Panja , Tanujit Chakraborty , Sk Shahid Nadim , Indrajit Ghosh , Uttam Kumar , Nan Liu

Categories: Machine Learning

2022-12-16

Dengue fever is a virulent disease spreading over 100 tropical and subtropical countries in Africa, the Americas, and Asia. This arboviral disease affects around 400 million people globally, severely distressing the healthcare systems. The unavailability of a specific drug and ready-to-use vaccine makes the situation worse. Hence, policymakers must rely on early warning systems to control intervention-related decisions. Forecasts routinely provide critical information for dangerous epidemic events. However, the available forecasting models (e.g., weather-driven mechanistic, statistical time series, and machine learning models) lack a clear understanding of different components to improve prediction accuracy and often provide unstable and unreliable forecasts. This study proposes an ensemble wavelet neural network with exogenous factor(s) (XEWNet) model that can produce reliable estimates for dengue outbreak prediction for three geographical regions, namely San Juan, Iquitos, and Ahmedabad. The proposed XEWNet model is flexible and can easily incorporate exogenous climate variable(s) confirmed by statistical causality tests in its scalable framework. The proposed model is an integrated approach that brings wavelet transformation into an ensemble neural network framework, which helps in generating more reliable long-term forecasts. The proposed XEWNet allows complex non-linear relationships between the dengue incidence cases and rainfall, yet it remains mathematically interpretable, fast in execution, and easily comprehensible. The proposal's competitiveness is measured using computational experiments based on various statistical metrics and several statistical comparison tests. In comparison with statistical, machine learning, and deep learning methods, our proposed XEWNet performs better in 75% of the cases for short-term and long-term forecasting of dengue incidence.


MAIL: Malware Analysis Intermediate Language

Shahid Alam

Categories: Natural Language Processing

2022-11-06

This paper introduces and presents a new language named MAIL (Malware Analysis Intermediate Language). MAIL is basically used for building malware analysis and detection tools. MAIL provides an abstract representation of an assembly program and hence the ability of a tool to automate malware analysis and detection. By translating binaries compiled for different platforms to MAIL, a tool can achieve platform independence. Each MAIL statement is annotated with patterns that can be used by a tool to optimize malware analysis and detection.


MEDS-Net: Self-Distilled Multi-Encoders Network with Bi-Direction Maximum Intensity projections for Lung Nodule Detection

Muhammad Usman , Azka Rehman , Abdullah Shahid , Siddique Latif , Shi Sub Byon , Byoung Dai Lee , Sung Hyun Kim , Byung il Lee , Yeong Gil Shin

Categories: Computer Vision

2022-10-30

In this study, we propose a lung nodule detection scheme which fully incorporates the clinical workflow of radiologists. In particular, we exploit Bi-Directional Maximum Intensity Projection (MIP) images of various thicknesses (i.e., 3, 5 and 10 mm) along with a 3D patch of the CT scan, consisting of 10 adjacent slices, to feed into a self-distillation-based Multi-Encoders Network (MEDS-Net). The proposed architecture first condenses the 3D patch input to three channels by using a dense block, which consists of dense units that effectively examine the nodule presence from 2D axial slices. This condensed information, along with the forward and backward MIP images, is fed to three different encoders to learn the most meaningful representation, which is forwarded into the decoder block at various levels. At the decoder block, we employ a self-distillation mechanism by connecting the distillation block, which contains five lung nodule detectors. It helps to expedite the convergence and improves the learning ability of the proposed architecture. Finally, the proposed scheme reduces the false positives by complementing the main detector with auxiliary detectors. The proposed scheme has been rigorously evaluated on 888 scans of the LUNA16 dataset and obtained a CPM score of 93.6%. The results demonstrate that incorporating bi-directional MIP images enables MEDS-Net to effectively distinguish nodules from their surroundings, which helps to achieve sensitivities of 91.5% and 92.8% at false positive rates of 0.25 and 0.5 per scan, respectively.
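
A maximum intensity projection keeps, for each pixel position, the largest value found across a slab of slices. The sketch below is only an illustration of that idea: it assumes "forward" and "backward" mean slabs starting at, and ending at, a chosen center slice, and it measures thickness in slices rather than millimetres. The abstract does not specify either detail, and the function names are hypothetical.

```python
def mip(slices):
    """Pixel-wise maximum over a list of equally sized 2D slices."""
    rows, cols = len(slices[0]), len(slices[0][0])
    return [
        [max(s[r][c] for s in slices) for c in range(cols)]
        for r in range(rows)
    ]


def bidirectional_mip(volume, center, thickness):
    """Return (forward, backward) projections around the center slice.

    Assumption: the forward slab covers `thickness` slices starting at
    `center`; the backward slab covers `thickness` slices ending at `center`.
    Both slabs are clipped at the volume boundaries.
    """
    forward = volume[center:center + thickness]
    backward = volume[max(0, center - thickness + 1):center + 1]
    return mip(forward), mip(backward)


# Toy volume: five slices, each a 1x2 image.
volume = [[[1, 2]], [[5, 0]], [[3, 9]], [[0, 4]], [[7, 1]]]
forward_mip, backward_mip = bidirectional_mip(volume, center=2, thickness=2)
```

With real CT data the same operation would typically be done on arrays, and the slab thickness in slices would be derived from the slice spacing.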


Label Flipping Data Poisoning Attack Against Wearable Human Activity Recognition System

Abdur R. Shahid , Ahmed Imteaj , Peter Y. Wu , Diane A. Igoche , Tauhidul Alam

Categories: Machine Learning

2022-08-17

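Human Activity Recognition (HAR) is the problem of interpreting sensor data as human movement using efficient machine learning (ML) approaches. HAR systems rely on data from untrusted users, making them susceptible to data poisoning attacks. In a poisoning attack, attackers manipulate the sensor readings to contaminate the training set, misleading the HAR system into producing erroneous outcomes. This paper presents the design of a label flipping data poisoning attack against a HAR system, in which the labels of the sensor readings are maliciously changed during the data collection phase. Because of the noise and uncertainty in the sensing environment, such an attack poses a severe threat to the recognition system. Moreover, vulnerability to label flipping attacks is dangerous when activity recognition models are deployed in safety-critical applications. This paper sheds light on how the attack can be carried out in practice through a smartphone-based sensor data collection application. To our knowledge, this is one of the earlier research works to explore attacking HAR models via label flipping poisoning. We implement the proposed attack and test it on activity recognition models based on the following machine learning algorithms: multi-layer perceptron, decision tree, random forest, and XGBoost. Finally, we evaluate the effectiveness of a K-nearest neighbors (KNN)-based defense mechanism against the proposed attack.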
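
The core of a label flipping attack can be sketched in a few lines. This is a generic illustration, not the authors' implementation: it assumes the attacker flips a fixed fraction of randomly chosen training labels to a different class, and the function name, parameters, and activity names are hypothetical.

```python
import random


def flip_labels(labels, classes, rate, seed=0):
    """Return a poisoned copy of `labels` plus the indices that were flipped.

    A fraction `rate` of the labels (rounded down) is selected at random and
    each selected label is replaced by a different class from `classes`.
    """
    rng = random.Random(seed)
    poisoned = list(labels)  # leave the caller's list untouched
    n_flip = int(rate * len(poisoned))
    targets = rng.sample(range(len(poisoned)), n_flip)
    for i in targets:
        alternatives = [c for c in classes if c != poisoned[i]]
        poisoned[i] = rng.choice(alternatives)
    return poisoned, sorted(targets)


# Hypothetical activity labels for ten sensor windows.
clean = ["walk", "sit", "run", "sit", "walk", "run", "sit", "walk", "run", "sit"]
poisoned, flipped = flip_labels(clean, ["walk", "sit", "run"], rate=0.5, seed=1)
```

Training the same classifier on `clean` and on `poisoned` and comparing test accuracy is one simple way to measure the effect of the attack at a given flip rate.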


Granger Causality using Neural Networks

Samuel Horvath , Malik Shahid Sultan , Hernando Ombao

Categories: Machine Learning (Statistics) | Machine Learning

2022-08-07

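The Granger Causality (GC) test is a well-known statistical hypothesis test for investigating whether the past of one time series affects the future of another. It helps answer the question of whether one time series is helpful in forecasting another. Standard traditional approaches to Granger causality detection commonly assume linear dynamics, but such a simplification does not hold in many real-world applications, e.g., neuroscience or genomics, which are inherently non-linear. In such cases, imposing linear models such as Vector Autoregressive (VAR) models can lead to inconsistent estimation of the true Granger causal interactions. Machine Learning (ML) can learn the hidden patterns in datasets, and Deep Learning (DL) in particular has shown tremendous promise in learning the non-linear dynamics of complex systems. Recent work by Tank et al. proposes to overcome the issue of linear simplification in VAR models by using neural networks combined with sparsity-inducing penalties on the learnable weights. In this work, we build on the ideas introduced by Tank et al. and propose several new classes of models that can handle underlying non-linearity. First, we present the Learned Kernel VAR (LeKVAR) model, an extension of VAR models that also learns a kernel parametrized by a neural network. Second, we show that the importance of lags and of individual time series can be directly decoupled via decoupled penalties. This decoupling provides better scaling and allows us to embed lag selection into RNNs. Lastly, we propose a new training algorithm that supports mini-batching and is compatible with commonly used adaptive optimizers such as Adam. We apply these methods to electroencephalogram (EEG) data from an epilepsy patient to study the evolution of GC before, during, and after seizure across the 19 EEG channels.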


Unsupervised Ensemble Based Deep Learning Approach for Attack Detection in IoT Network

Mir Shahnawaz Ahmed , Shahid Mehraj Shah

Categories: Machine Learning

2022-07-16

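The Internet of Things (IoT) has changed daily life by allowing devices and things to be controlled over the Internet. IoT has provided many smart solutions for everyday problems, transforming cyber-physical systems (CPS) and other classical fields into smart regions. Most of the edge devices that make up the Internet of Things have very little processing power. To bring down an IoT network, attackers can exploit these devices to conduct a variety of network attacks. In addition, as more and more IoT devices are added, the potential for new and unknown threats grows exponentially. For this reason, an intelligent security framework for IoT networks must be developed that can identify such threats. In this paper, we develop an unsupervised ensemble learning model that is able to detect new or unknown attacks in an IoT network from an unlabelled dataset. The system-generated labelled dataset is used to train a deep learning model to detect IoT network attacks. Additionally, the research presents a feature selection mechanism for identifying the most relevant aspects in the dataset for detecting attacks. The study shows that the suggested model is able to label the unlabelled IoT network datasets, and that a DBN (Deep Belief Network) outperforms the other models, with a detection accuracy of 97.5% and a false alarm rate of 2.3%, when trained using the dataset labelled by the suggested approach.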


On-device Synaptic Memory Consolidation using Fowler-Nordheim Quantum-tunneling

Mustafizur Rahman , Subhankar Bose , Shantanu Chakrabartty

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-06-27

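Synaptic memory consolidation has been heralded as one of the key mechanisms for supporting continual learning in neuromorphic Artificial Intelligence (AI) systems. Here we report that a Fowler-Nordheim (FN) quantum-tunneling device can implement synaptic memory consolidation similar to what can be achieved by algorithmic consolidation models such as the cascade and elastic weight consolidation (EWC) models. The proposed FN-synapse not only stores the synaptic weight but also stores the synapse's historical usage statistics on the device itself. We also show that the operation of the FN-synapse is near-optimal in terms of synaptic lifetime, and we demonstrate that a network comprising FN-synapses outperforms a comparable EWC network on a small benchmark continual learning task. With an energy footprint of femtojoules per synaptic update, we believe that the proposed FN-synapse provides an ultra-energy-efficient approach for implementing both synaptic memory consolidation and continual learning.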


Breast Cancer Classification using Deep Learned Features Boosted with Handcrafted Features

Unaiza Sajid , Dr. Rizwan Ahmed Khan , Dr. Shahid Munir Shah , Dr. Sheeraz Arif

Categories: Computer Vision | Machine Learning

2022-06-26

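Breast cancer is one of the leading causes of death among women across the globe. It is difficult to treat if detected at an advanced stage; however, early detection can significantly increase the chances of survival and improve the lives of millions of women. Given the widespread prevalence of breast cancer, it is of utmost importance for the research community to come up with frameworks for early detection, classification, and diagnosis. The artificial intelligence research community, in coordination with medical practitioners, is developing such frameworks to automate the task of detection. With the surge in research activities, coupled with the availability of large datasets and enhanced computational power, it is expected that AI framework results will help even more clinicians in making correct predictions. In this article, a novel framework for the classification of breast cancer using mammograms is proposed. The proposed framework combines robust features extracted from a novel Convolutional Neural Network (CNN) with handcrafted features, including HOG (Histogram of Oriented Gradients) and LBP (Local Binary Pattern). The results obtained on the CBIS-DDSM dataset exceed the state of the art.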
